Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2308.12966

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 17
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Top Vision-Language Papers 🖼️💬📝

A curated list of papers on vision-language models, with the most influential ones at the top.

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 47
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 28

Paused

2.09k

2.09k

Anycoder

🏢

Select and view code snippets for different providers
Running

273

273

Qwen2.5 Coder Artifacts

🐢

Generate application code with Qwen2.5-Coder-32B
Running

916

916

QwQ-32B-Preview

🔍

QwQ-32B-Preview
Running on CPU Upgrade

13.1k

13.1k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

Multimodal Papers

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 17
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Paper • 2311.07574 • Published Nov 13, 2023 • 16
MyVLM: Personalizing VLMs for User-Specific Queries

Paper • 2403.14599 • Published Mar 21, 2024 • 17

Vision Language Models Papers 🖼️💬📝

Papers about vision-language models, most important ones are on top of the list.

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 47
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 28

Vision-Language Model

Visual Instruction Tuning

Paper • 2304.08485 • Published Apr 17, 2023 • 16
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9

Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

Paper • 2309.15915 • Published Sep 27, 2023 • 2
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Paper • 2310.00653 • Published Oct 1, 2023 • 3
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Paper • 2309.09958 • Published Sep 18, 2023 • 19

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 36
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 10
Running

197

197

Qwen-VL-Plus

📷

Chat with images and text using Qwen-VL-Plus

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22, 2024 • 21
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 5
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 187
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Paper • 2401.10208 • Published Jan 18, 2024 • 1
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Paper • 2305.11172 • Published May 18, 2023 • 1
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Paper • 2302.00402 • Published Feb 1, 2023
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs